Investigation of the Lambda Parameter for Language Modeling Based Persian Retrieval
Abstract
Language modeling is one of the most powerful methods in information retrieval. Many retrieval systems based on language modeling have been developed and tested on English collections, so evaluating language modeling on collections in other languages remains an interesting research question. In this study, four language modeling methods proposed by Hiemstra [1] are evaluated on a large Persian collection drawn from a news archive. We also study two approaches proposed for tuning the Lambda parameter. Experimental results show that the performance of language models on Persian text improves after Lambda tuning. More specifically, the Witten-Bell method provides the best results.
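To make the role of the Lambda parameter concrete, the following is a minimal sketch of query-likelihood scoring with linear interpolation between a document model and a collection model. It is not taken from the paper: the function names, the toy documents, and the textbook Witten-Bell weight (document length divided by document length plus number of distinct terms) are assumptions for illustration, and the exact variants evaluated in the study may differ.

```python
import math
from collections import Counter


def witten_bell_lambda(doc):
    """Textbook Witten-Bell weight: |d| / (|d| + number of distinct terms).

    Assumption: the study's Witten-Bell variant may be defined differently.
    """
    if not doc:
        return 0.0
    return len(doc) / (len(doc) + len(set(doc)))


def score(query, doc, collection_tf, collection_len, lam=None):
    """Log-probability of the query under an interpolated document model.

    P(t|d) = lam * tf(t, d) / |d| + (1 - lam) * cf(t) / |C|
    If lam is None, a per-document Witten-Bell weight is used instead
    of a fixed value.
    """
    tf = Counter(doc)
    doc_len = len(doc)
    if lam is None:
        lam = witten_bell_lambda(doc)
    total = 0.0
    for term in query:
        p_doc = tf[term] / doc_len if doc_len else 0.0
        p_col = collection_tf[term] / collection_len
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            # Term is unseen in both the document and the collection.
            return float("-inf")
        total += math.log(p)
    return total


# Hypothetical toy collection of tokenized documents.
docs = [
    ["persian", "news", "archive"],
    ["language", "model", "retrieval", "retrieval"],
]
collection_tf = Counter(term for d in docs for term in d)
collection_len = sum(len(d) for d in docs)

query = ["retrieval"]
fixed = [score(query, d, collection_tf, collection_len, lam=0.5) for d in docs]
adaptive = [score(query, d, collection_tf, collection_len) for d in docs]
```

With a fixed `lam`, every document is smoothed equally, and tuning means searching for the value that maximizes retrieval quality on held-out queries. With the Witten-Bell style weight, the amount of smoothing depends on each document's length and vocabulary.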
Similar resources
External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages
With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. One plagiarism detection technique uses search engines, converting plagiarism patterns into search queries. Generating suitable queries is the heart of this technique and existing methods suffer from lack of producing accurate queries, Precision and Speed of retrieved result...
Comparative Study of Degree of Bilingualism in Lexical Retrieval and Language Learning Strategies
This study compares lexical retrieval among monolinguals, intermediate bilinguals, and advanced bilinguals. It also investigates the possible effects of their language learning strategies on their respective lexical retrieval advantage. The study used a mixed methods design and the groups consisted of 20 Persian near-monolinguals, 20 Persian-English intermediate level bilinguals, and 20 Per...
Lexical Access in Persian Speaking Children With and Without Specific Language Impairment
Introduction: Word retrieval problems are among the limitations observed in children with specific language impairment during the initial schooling years. These restrictions are predictive of reading problems and poor performance at school. Additionally, studies on lexical access in Persian speaking children are scarce. Therefore, this study aimed to investigate and compare naming accuracy and ...
Corpus-Based Insights into Modeling a Level-Specific Persian Language Proficiency Test (PLPT): Development and Factor Structure of the PLPT Listening Tasks
Investigating the Effects of Stemming on Information Retrieval in the Persian Language
Using language-specific behavior in information retrieval systems can significantly improve the quality of the retrieved results. The part of a word that remains after removing its affixes is called the stem. Stemming can be used to improve the relevance of the results in an information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...